Synopsis: Jaywalking
Let's see how using a comma-separated list causes an antipattern.
We'll cover the following
Let’s imagine that you are developing a feature in a bug-tracking application to designate a user as the primary contact for a product. You originally allow only one user to be the contact for each product. Unsurprisingly, however, you are soon requested to support assigning multiple users as contacts for a given product.
At the time, it seems simple enough to change the database to store a list of user account identifiers separated by commas in place of the single identifier that you used before.
Soon, your boss approaches you with a problem: “The engineering department has been adding associate staff to their projects. They tell me they can add five people only. If they try to add more, they get an error. What’s going on?” You nod and say, “Yeah, you can only list so many people on a project,” as though this is completely ordinary.
However, sensing that your boss needs a more precise explanation, you quickly add “Well, five to ten—maybe a few more. It depends on how old each person’s account is.” But your boss begins to raise their eyebrows now and you realize that you can’t simply toss the matter aside, so you continue, “I store the account IDs for a project in a comma-separated list. But the list of IDs has to fit in a string with a maximum length. If the account IDs are short, I can fit more into the list. So, people who created the earlier accounts have an ID of 99 or less, and those are shorter than later IDs, which go past 99.” Your boss frowns. You have a feeling you’re going to be staying late.
Programmers commonly use comma-separated lists to avoid creating an intersection table for a many-to-many relationship. This antipattern is often named Jaywalking, the name given to the act of avoiding an intersection.
Objective: Store multivalue attributes#
When a column in a table has a single value, the design is straightforward: we can choose an SQL data type to represent a single instance of that value —for example, an integer
, date
, or string
. But how do we store a collection of related values in a column?
In the example bug-tracking database, we can associate a product with a contact using an integer
column in the Products
table. Each account may have many products, and each product references one contact, so we have a many-to-one relationship between products and accounts.
As our project matures, we realize that a product may have multiple contacts. Thus, in addition to the many-to-one relationship, we also need to support a one-to-many relationship from products to accounts. One row in the Products
table must be able to have more than one contact.
Legitimate uses of the antipattern#
We may improve the performance for some kinds of queries by applying “denormalization” to our database organization. Storing lists as a comma-separated string is an example of denormalization.
For example, our application may need the data in a comma-separated format and there may be no need to access individual items in the list. Likewise, if our application receives a comma-separated format from another source and we simply need to store the full list in a database and retrieve it later in exactly the same format, there’s no need to separate the values.
The general rule is to be conservative if we decide to employ denormalization. Normally, it is best to start by using normalized database organization because it permits our application code to be more flexible, and it allows our database to help preserve data integrity.